A METHOD FOR OPTIMAL IMAGE SUBTRACTION. 
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ABSTRACT 

We present a new method designed for optimal subtraction of two images with different seeing. Using 
image subtraction appears to be essential for the full analysis of the microlensing survey images, however 
a perfect subtraction of two images is not easy as it requires the derivation of an extremely accurate 
convolution kernel. Some empirical attempts to find the kernel have used the Fourier transform of bright 
stars, but solving the statistical problem of finding the best kernel solution has never really been tackled. 
We demonstrate that it is possible to derive an optimal kernel solution from a simple least square analysis 
using all the pixels of both images, and also show that it is possible to fit the differential background 
variation at the same time. We also show that PSF variations can also be easily handled by the method. 
To demonstrate the practical efficiency of the method, we analyzed some images from a Galactic Bulge 
field monitored by the OGLE II project. We find that the residuals in the subtracted images are very 
close to the photon noise expectations. We also present some light curves of variable stars, and show 
that, despite high crowding levels, we get an error distribution close to that expected from photon noise 
alone. We thus demonstrate that nearly optimal differential photometry can be achieved even in very 
crowded fields. We suggest that this algorithm might be particularly important for microlensing surveys, 
where the photometric accuracy and completeness levels could be very significantly improved by using 
this method. 



1. INTRODUCTION 

The search for microlensing events towards the LMC 
MACHO (Alcock et al. 1993), EROS (Aubourg et al. 
1993) the Galactic Bulge OGLE (Udalski et al. 1994) 
, MACHO, DUO (Alard & Guibert 1997) or the M31 
Galaxy, AGAPE (Ansari et al. 1997), has provided us 
with an impressive database of images of densely crowded 
fields. The target fields have been monitored for several 
seasons, providing us with time series containing hundreds 
of images. Light curves for millions of stars can then be 
easily obtained with one of the widely used profile fitting 
codes such as DoPHOT (Schechter & Mateo, 1993). The 
search for variable objects among these huge light curve 
databases has proved very fruitful, for microlensing (MA- 
CHO, OGLE, DUO, EROS), and also for variable stars 
(MACHO, OGLE, DUO, EROS). However we would hke 
to emphasize that photometry and detection of variable 
(including moving) objects should be based on the dif- 
ference between frames, whereas photometric codes like 
DoPHOT are designed to perform profile fitting photom- 
etry of stars detected on a reference frame. In a variable 
object appears but was not seen on the reference it won't 
be detected, leading to a serious loss of efficiency for mi- 
crolensing. The completeness of the variable star cata- 
logue will be also seriously affected. Another concern is 
that of photometric accuracy. With multi-profile fitting 
techniques, the absolute photometry of a given (crowded) 
star requires perfect PSF estimation and careful modeling 
of all other close components, and also a correct estimate 
of the background value around each star. For the par- 
ticular application of finding light curves of variable ob- 
jects, it is more efficient to estimate only that part of the 
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star's brightness which varies from image to image; this 
is exactly the problem that image subtraction is designed 
to solve. The first attempt to perform image subtraction 
was made by Tomaney & Crotts 1996, (hereafter TC) for 
data taken towards the M31 Galaxy (Crotts & Tomaney 
1996). To make a perfect subtraction of two images, one 
has to match the frames to exactly the same seeing. TC 
proposed degrading a good seeing image to match a refer- 
ence frame with bad seeing. The quality achieved in the 
subtracted image is very dependent of the quality of the 
kernel determination, and finding the proper kernel is a 
very delicate operation. TC proposed deriving the ker- 
nel by simply taking the ratio of the Fourier transform of 
a bright star on each image. However the high frequen- 
cies are dominated by noise and they were forced to use a 
Gaussian extrapolation to determinate the wings (Phillips 
and Davies 1995). This method provides no guarantee of 
producing the highest attainable quality of the subtracted 
image. Even apart from the non-Gaussian wings of the 
true kernel, and the limited number of bright, uncrowded, 
stars with sufficient signal to noise ratio, this method is 
non-optimal in the sense that it does not use all the in- 
formation available: in fact every star, even if extremely 
crowded, contains information about the kernel; to get 
an optimal solution we must use all of that information. 
Additionally their method has difficulty with rapid, com- 
plicated, PSF variations, and does not intrinsically handle 
background subtraction. The problem that we address 
here is how to find an optimal kernel solution, in order to 
get the best possible subtracted image. 



2. THE METHOD. 



2.1. Preliminaries. 

Before looking for the optimal subtraction, we need to 
perform some basic operations, to register the frames to a 
reference frame. Usually the frames have slightly different 
centers and orientation (and possibly scale), and we need 
to perform an astrometric transform to match the coordi- 
nates of the reference frame. We determine this transform 
by fitting a two dimensional polynomial using 500 stars 
on the reference frame, and the same number on the other 
frame. Using this transform we then resample the frame 
on the grid defined by the reference frame. This resam- 
pling is performed by interpolating using bicubic splines, 
which gives excellent accuracy. All the frames are then 
on the same coordinate system, and we can proceed to 
matching the seeing. 

2.2. The reference frame. 

Here we emphasize that, contrary to TC, we choose to 
take the best seeing frame as the reference. We do not 
wish to degrade the frame to the worst seeing frame, as 
this will clearly lower the signal to noise ratio. Later, we 
will match the seeing in our frames by convolving the ref- 
erence to the seeing of each other frame. This is likely to 
be more difficult, as aligning to good seeing frames is more 
difficult, but we are looking for an optimal result. 

2.3. Seeing alignment to the reference. 

We now arrive to the fundamental problem of matching 
the seeing of two frames with different PSFs. We do not 
want to make any assumption concerning the PSF on the 
frame, and we plan to use all the pixels. The important 
point is that most of the stars on a given frame do not 
have large amplitude variations, but variations of at most 
1 or 2 %. This allows us to say that most of the pixels on 
two frames of the same field would be very similar, if the 
seeing were the same. Consequently, one possibility is to 
try to find the kernel by finding the least square solution 
of the equation: 



V, 



Rei{x, y) ® Kernel(u, v) ~ I{x, y) 



(1) 



Where Ref is the reference image, and I the image to align. 
The symbol ® denotes convolution. In principle solving 
this equation is a non-linear problem, for which a real- 
istic computer solution looks impossible. However if we 
decompose our kernel using some basis of functions, the 
problem becomes a standard linear least square problem. 
If we decompose the kernel as: 



Kerncl(u,w) = > Oj x Bi{u,v), 



(2) 



solving the least square gives the following Matrix equa- 
tion for the ai coefficients: 



Ma = V 



Rei{x,y) x Ci{x,y)/a{x,y)'^ dxdy 



C,{x,y) = I{x,y)®Bi{x,y) 

In choosing to solve the problem by least-squares we've 
implicitly approximated the images Poisson statistics with 
Gaussian distributions with variance a{x,y)'^: 



a{x,y) = kx ^/l{x,y) 

We set the constant k by taking into account the detec- 
tor's gain (ratio of photons detected to ADU). Note that 
the matrix M is just the scalar product of the set of vectors 
Ci, and the vector V the scalar product of the Ci with I. 
All we have to do now is to look for a suitable basis of 
functions to model the kernel. The functions of this basis 
must have finite sums, and must drop rapidly beyond a 
given distance (the size of an isolated star's image). To 
solve this problem, we start with a set of Gaussian func- 
tions, which we modify by multiplying with a polynomial. 
These basis functions allow us to model the kernel, even if 
its shape is extremely complicated. We adopt the follow- 
ing decomposition: 



Kernel(M, ") = X! X! X! 



X e-(-'+-')/2-i^ „< y'^l 



where < d^ < £)„, < dj^ -I- rf^ < £)„, and I?„ is the 
degree of the polynomial corresponding to the n"^ Gaus- 
sian component. There are a total of (_D„ -I- l){Dn + 2)/2 
terms for each value of n. 
In the notation of eq (2), 



B{u,v) 



^ ^-{u'+v')/2al 



.dl 



In practice, it seems that 3 Gaussian components with 
associated polynomial degrees in the range 2 to 6 can 
give subtracted images with residuals comparable to a/2 x 
photon noise. 

2.4. Differential background subtraction. 

Another important issue is that the differential back- 
ground variation between the frames can be fitted simul- 
taneously with the kernel. In eq (1) we did not considered 
any background variations between the two frames; let's 
modify eq (1) in the following way: 

Ref(a;, y) ® Kernel(a;, y) = I{x, y) + bg{x, y) (3) 

We shall use the following polynomial expression for 
bg(,x,y): 

bg{x,y) ^^^Oix'y^ 



with 0<i<D''9,0<i+j<D''9, and D''^ is the degree of 
the polynomial used to model the differential background 
variation. The least square solution of eq (3), will lead 
to a matrix equation similar to the previous one, except 
that we have to increase the number of Ci vectors; our 
definitions of the matrix M and vector V relative to the 



With: 



Ci remain the same as in section 2.3. We have 



M,j = / Ci{x,y) X Cj{x,y)/(j{x,yf 



dxdy 



Ci{x,y) = 



x^ y*-' 



I{x,y)<Si Bi{x,y), 



if i = ■ ■ ■ ribg — 1 
if i ^ Ubg-- -Ubg +n 
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where Ubg = (£>''» + !)(£>''» + 2)/2 - 1 and n = Ej(-Dj + 
l)(Dj + 2)/2; note that, for « >= ni,g, the Ci are identical 
to our previous results. 

3. TAKING INTO ACCOUNT THE PSF VARIATIONS. 

There are two ways to handle the problem of PSF vari- 
ations. Firstly, most of the time, the field is so dense that 
a transformation kernel can be determined in small areas, 
small enough that we can ignore the PSF's variation. This 
is the great advantage of a method which does not require 
any bright isolated stars to determine the kernel, but can 
be used on any portion of an image, provided that the 
signal to noise is large enough to determine the kernel. In- 
deed, the more crowded the field the easier it is to model 
variations of the PSF. A second possibility is to make an 
analytical model of the kernel variations. We take the fol- 
lowing kernel model: 



Kernel(a;, y,u,v) = H XIH XIH [' 



5= gy 

a-nX y 



n dZ dl 5=" 5v 



X e-(«^+-^)/2-^ «< yK 



Where: < J^ < D'^, < (5^ + 5^ < D^ , and D^ is the 
degree of the polynomial transform that we use to fit the 
kernel variations. Provided that the kernel variations with 
X and y are small enough compared to the u,v variations, 
we can easily calculate new expressions for the C^s: 



C^{x,y)-{ \ly-)^Bo,{x,y), 



if i = Q ■ ■ ■ Hbg — 1 
if i = Ubg-- -Ubg +n 



where 



Boi{u,v) = Bi{u,v) X u^ v^ 



with the values of 6x and 6y implicit in the index i, and 
now n = EjiDj + l){Dj + 2)/2) x (D'= + 1){D'' + 2)/2. 
Unfortunately these equations do not guarantee the con- 
servation of fiux. Consequently we must add the condition 
that the sum of the kernel has to be constant. To simplify 
the equation we also normalize the Bo functions, so that 
each of them sums to one. We can then rewrite the kernel 
decomposition: 

Kernel(a;, j/,u, w) = ^ ^ aijx, y) x [Boi{u,v) — Bni{u,v)] 



i=0 



norm x Bon 



We can calculate the norm (the sum of the kernel) by mak- 
ing a constant PSF fit in several small area. The different 
values will then be averaged to get the constant norm. The 
solution of the system for the coefficients a„ is very sim- 
ilar to the previous case of a constant PSF. We shall not 
bother to give all the the details here. 

4. APPLICATION OF THE METHOD TO OGLE DATA. 

The OGLE team has kindly provided us with a stack of 
images of a field situated 2 degrees from the Galactic Cen- 
ter, in order to experiment with our method. For these 



particular images, the optimal kernel has a complicated 
shape and it would be probably be very difficult to com- 
pute reliably with a simple Fourier division; we consider 
this field an excellent test of our method. The data was 
taken in drift scan mode (TDI), so the form of the PSF can 
vary rapidly with row number on the CCD. We extracted a 
smaU (500 x 1000) sub-frame from the 2048 x 8192 original 
images. One of the images has quite outstanding seeing, 
and we took it as a reference. All frames were resampled 
to the reference grid by using the method previously de- 
scribed. To model the kernel, we took 3 Gaussian compo- 
nents with associated polynomials. For the first Gaussian 
we took a = 1 pixels and a = 3 and cr = 9 for the two 
others. The degree of the associated polynomials were re- 
spectively 6, 4, and 2. We divided the sub frame into 
128 X 256 pixels regions. We applied our method to each 
of these regions, which provided us with one subtracted 
image per region. We reconstructed the subtracted image 
of our whole sub-frame by mosaicing the subtracted im- 
ages obtained for each region. In this set of 86 images, the 
seeing varies from 0.7 arcsec to 2.5 arcsec, and some of the 
frames have elongated stellar images. We started by mak- 
ing an initial residual image using all unsaturated pixels. 
We then made a 3 ct rejection of the pixel list, to get rid of 
the variables. We usually used 4 iterations of the method, 
to be completely unbiased by large amplitude variables. 
We found that for all images, the final residual calculated 
from the subtracted image was very close to that expected 
from Poisson statistics. To illustrate this result we plot 
in Fig. 1 the initial images and the subtracted image for 
a small field containing a variable star at its center. The 
stellar images are sharply peaked on the reference, while 
they look quite fuzzy and assymetric on the other image. 
This is well confirmed by the shape of the best convolu- 
tion kernel which looks elongated and has a complicated 
shape. This example clearly illustrate the ability of our 
method to deal with any kernel shape. We can imagine 
that in this case any Gaussian approximation of the ker- 
nel itself or of its Fourier transform would not be satis- 
factory. For illustrative purposes, we also normalized the 
subtracted image by the sum of the photon noise expected 
from the two images (see Fig. 2). Once this normaliza- 
tion is applied, we see that the larger deviations visible at 
the location of the bright stars disappear, suggesting that 
the subtraction errors correspond to Poisson noise. This is 
confirmed by calculating the reduced chi squared: we find 
X^/j/ = 1.05 (before doing this calculation we removed a 
small area around the variable star at center of the image). 
We also plot the histogram of the normalized deviations 
in Fig. 2. This histogram is very close to a Gaussian with 
zero mean and unit variance (i.e. N(0,1)). We observe de- 
viations significantly larger than the Poisson expectations 
only for very bright stars (about 5 to 10 times brighter 
than the brightest stars in the small field we present). We 
believe that these residuals are due to seeing variations, 
see section 5 for more details; the number of such bright 
stars in an image is very small. To spot the variables stars, 
we created a "deviation image" by co-adding the square 
of the subtracted images. We normalized the deviation 
image by normalizing with the pixels standard deviations. 
We found many variables at very significant levels. Most 
of them seem to be bright giants with small amplitudes. 




Fig. 1. — Example of subtracted image. The two bottom figures of the panel are the original images. On the right is 
the reference image, and on the left is the image to be fitted by kernel convolution. The two upper figures show the best 
kernel solution on the right and on the left the subtracted image. Note the complicated shape of the kernel. 
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Fig. 2. — Noise in the subtracted image. The image on left is the subtracted image normalized by the Poisson deviations 
of both the reference and the image for the small field presented at previous figure. On right we show the histogram 
of the pixels in this image. We sumperimposed to this histogram a gaussian of variance 1 (dashed line). Note that the 
deviations due to the bright stars are no longer visible. The variable star at center clearly stands out at very significant 
level. The dark pixel in the image is a cosmic. 
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Fig. 3. — Examples of bright giants light curve. The x-axis are days, the y-axis are percentage variation (with respect to 
reference frame). Errors bars are derived from the Poisson deviations associated to each image, we do not include here 
the deviations associated with the reference image. 
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Fig. 4. — The periodic variables. The x-axis represent the phase , the y-axis are percentage variation (with respect to 
reference frame). 
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Some of the variables appeared to be periodic, we found 
a few RR Lyraes and some eclipsing variables. We com- 
puted the flux variations for these stars by making simple 
aperture photometry. In Fig. 3 and Fig. 4 we give an 
illustration of the result we have obtained. 

5. SOURCES OF NOISE IN THE RESIDUAL IMAGE. 

As discussed above, the variance of the residual image is 
approximately equal to the sum of the variances of the in- 
put images. If we created a reference image by co-adding a 
large number of good-seeing images we could remove the 
contribution of the noise in the reference; we would, of 
course, have to be careful about variability between the 
different reference frames. Upon inspection of the residual 
images, however, some showed significantly larger residu- 
als than expected from Poisson statistics near the position 
of bright, but non-saturated, stars (it does represent less 
than 1 % of the stars visible on the frame). These showed 
the characteristic signature of centering errors, with equal 
positive and negative residuals even for stars which show 
no evidence of variability (i.e. the sum of all residuals 
within a few arc-seconds is zero). We believe that this 
is produced by the turbulent atmosphere modulating our 
Kernel on the scale of our sub-regions. 

Shao and Colavita (1992) quote the variance in the angle 
between two stars separated by 9 as 



-I 



5.25 (6l/radian)^/^ (t/sec)"^ / Cl{h)h'^/^V-\h) dh 



for the regime in which we are interested (their equation 
2). They evaluate the integral using data from Roddier 
et al. (1990) for a night on Mauna Kea with « 0.5arcsec 
seeing to give 



crs 



l.l(6l/radian)^/^(i/scc)"^/^ 



arcsec 



If we assume that the integral over the atmosphere scales 
with seeing in a similar way to the integral 



Cl{h)dh 



which enters into the definition of the Fried parameter, tq, 
we may expect that this result will scale as (roA^^/^)^^/^ 
(a result which is independent of A due to the wavelength 
dependence of rg). In 1 arcsec seeing, therefore, we may 
expect that 



CT5 



2(6i/radians)^/^(i/scc)"^/^ 



arcsec 



On the typical scale of our 128 x 256 regions, and for 
128s exposures, this corresponds to an RMS image mo- 
tion of about O.Ollarcsec. If we model the PSF as a 
Gaussian with width parameter a (a « 0.424 for larcsec 
FWHM images), this would produce a maximum residual 
of cr5/aexp(— 1/2), or 1.6%. This is of the same order as 
the residuals that we see in our frames. 

6. HARMONIC FITTING TO THE PERIODIC VARIABLES. 

We expect that the periodic variables light curves to be 
well approximated with truncated Fourier series. We cal- 
culate the period using the Renson method (1978), and 



we fit Fourier series with different number of harmonics. 
The errors are calculated from the photon noise in each 
image. We do not include the noise associated with the 
reference image because it is produces an error only in 
the total magnitude, and, to first order, doesn't affect the 
variable part of the object's flux. We estimate each time 
the chi-square per degree of freedom (x^), and we look for 
the best chi-square with the minimum number of harmon- 
ics. The results are given in table 1 where we see that the 
resulting value of Xd is close to unity, for most variables. 
Except for variable PI our mean error is at most only 25 
% larger than the Poisson expectation (i.e. Xd < 1-56); 
of course, this Xd excess is significant. In the case of the 
variable PI the Xd is very inconsistent with the Poisson 
expectation. This variable has about the same brightness 
as P6. We checked the quality of the subtracted images, 
but could not identify any defects. The quality of the im- 
age subtraction is as good for PI and for P6, they have 
about the same brightness, so what's wrong ? Considering 
that the mean error is fairly small (about 1%), we might 
suspect some residual error due to flat fielding. However, 
we get a mean residual of only 0.6 % for P6 and Xd — 1-1 
showing that the fiat fielding errors are much smaller than 
0.6 %. This is not surprising because these images were 
taken in drift scan mode, and consequently, we average the 
sensitivity of many pixels. We conclude that there must 
be some intrinsic reason for Pi's bad Xd- It is possible that 
variables do not repeat perfectly from cycle to cycle. This 
kind of variable star is well known to have spots which are 
likely to induce variability at the sub percent level. It is 
also possible that the RR Lyraes don't repeat perfectly, 
they are well known to show the Blashko effect, and we 
can explain some of the Xd ^s being due to cycle to cycle 
variations. Although estimating the Xd of periodic variable 
stars is not an absolute test, we conclude that on average 
we are only about 20 % above the Poisson error, and con- 
sequently there is not much to be gained from improving 
our method. However, we must note that the errors due to 
the reference frame are the same for the integrated fiux of a 
star on each image only at first order of approximation. By 
convolving the reference each time to fit the seeing varia- 
tions, we change slightly the noise distribution around the 
star. Especially for the case where a bright star is close to 
our object, convolving with the kernel might spread some 
noise into our photometric aperture. This effect will be 
negligible for good seeing frames, but noticeable when the 
seeing's bad. An obvious solution is to construct a refer- 
ence with a signal to noise as good as possible by stacking 
the best seeing images; see the next section. Another ap- 
proach with potential to improve the signal-to-noise would 
be to use a matched filter to measuring our stars variabil- 
ity. Unfortunately, simply applying the usual PSF-filter 
leads to problems with aperture corrections, and we shall 
not investigate this approach in this paper. 



7. IMPROVING THE REFERENCE FRAME. 

We averaged the 20 best seeing images to build a ref- 
erence frame with excellent signal to noise. The resulting 
seeing is of course not as good as it was in our previous 
reference which was the best image. But the seeing vari- 
ations are much reduced, as well as the noise amplitude. 



Table 1: Harmonics fitting to the periodic variables. In column 3 we give the value of the mean residual to the fit. It is 
useful to compare this residual to possible flat fielding errors. Wc also give an estimate of the star magnitude difference 
to the RR Lyrae Amag- We assume that the RR Lyrae have all the same mean magnitude. A crude estimate for the RR 
Lyrae mean magnitude in this field is I ~ 17. 



Variable 



Xd 



Mean Residual (%) A^ 



1 


2.01 


1.0 


2 


1.43 


1.1 


3 


1.55 


1.6 


4 


1.46 


1.2 


5 


1.17 


1.3 


6 


1.1 


0.6 



-1.117 

-0.4402 







-0.4348 



Tabic 2: Harmonics fitting to the light curves obtained with the new reference. See table 1 for the meaning of columns. 



Variable 



Xd 



Mean Residual (%) 



PI 


2.0 


1.0 


P2 


1.16 


1.0 


P3 


1.27 


1.45 


P4 


1.45 


1.2 


P5 


1.15 


1.3 


P6 


1.03 


0.6 



All the images were reprocessed using this new reference. 
We found that all the subtracted frames were improved. 
Even for the good seeing frame, were the seeing quality of 
the reference is critical we found some improvements. The 
light curves of the variables stars were also improved, we 
give the result of harmonic fitting in table 2. 

8. COMPUTING TIME. 

One might think that a method which fits all the pixels 
in an image (even if the fit is linear) is going to be much 
more time consuming than conventional methods. But the 
actual cost of the calculations is much lighter than might 
appear at first glance. Most of the computing time is taken 
by the calculation of the matrix wc define in section 2.3 
this is an iV^ process (N is the number of basis function 
we use). The rest of the calculation is an N process. The 
matrix could be calculated once for all and used to fit 
the kernel solution for all images. A problem with this 
approach is that we reject different pixels on each frame 
(due to new saturated pixels, or variable stars) and conse- 
quently the matrix elements change. In practice, we find 
that we reject no more than 1 % percent of the total num- 



ber of pixels, so that all that we have to do is to calculate 
the matrix elements for the rejected pixels, and subtract 
them from the original values. This process cost very little 
CPU, and once the original matrix has been built, the ker- 
nel solution can be fitted very quickly even though we use 
several clipping passes. The rest of the operations requires 
about the same computing time. By applying this method 
we can process a 1024 x 1024 frame in about 1 min with a 
200 Mhz PC; this could certainly be improved further by 
using better numerical algorithms for the solution of the 
linear system. 
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